remotes::install_github("nsgrantham/ggbraid")1 The task
In this take-home exercise, you are required to uncover the impact of COVID-19 as well as the global economic and political dynamic in 2022 on Singapore bi-lateral trade (i.e. Import, Export and Trade Balance) by using appropriate analytical visualisation techniques learned in Lesson 6: It’s About Time. Students are encouraged to apply appropriate interactive techniques to enhance user and data discovery experiences.
The following packages are used for this project.
Packages
- tidyverse:
- readxl: This package makes it easy to get data out of Excel and into R. It has no external dependencies, so it’s easy to install and use on all operating systems. It is designed to work with tabular data. The easiest way to install the latest released version from CRAN is to install the whole tidyverse. Note that you will still need to load readxl explicitly, because it is not a core tidyverse package loaded via
library(tidyverse) - tidyr: Tidy data describes a standard way of storing data that is used wherever possible throughout the tidyverse. If you ensure that your data is tidy, you’ll spend less time fighting with the tools and more time working on your analysis.
- lubridate: Functions to work with date-times and time-spans: fast and user friendly parsing of date-time data, extraction and updating of components of a date-time (years, months, days, hours, minutes, and seconds), algebraic manipulation on date-time and time-span objects. The easiest way to get lubridate is to install the whole tidyverse.
- readxl: This package makes it easy to get data out of Excel and into R. It has no external dependencies, so it’s easy to install and use on all operating systems. It is designed to work with tabular data. The easiest way to install the latest released version from CRAN is to install the whole tidyverse. Note that you will still need to load readxl explicitly, because it is not a core tidyverse package loaded via
- plotly: is an R package for creating interactive web-based graphs via the open source JavaScript graphing library
plotly.js. - d3scatter: is an HTML R widget for interactive scatter plots visualization. It is based on the htmlwidgets R package and on the d3.js javascript library.
- ggbraid: ggbraid provides a new stat,
stat_braid(), that extends the functionality ofgeom_ribbon()to correctly fill the area between two alternating lines (or steps) with two different colors. ggbraid also provides a geom,geom_braid(), that wrapsgeom_ribbon()and usesstat_braid()by default. You can install the development version of ggbraid from GitHub with:
ggplot2: A system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”. You provide the data, tell ‘ggplot2’ how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
- TimeSeries: This R package offers novel time series visualisations. It is based on
ggplot2and offersgeoms and pre-packaged functions for easily creating any of the offered charts. - hrbrthemes: A compilation of extra ‘ggplot2’ themes, scales and utilities, including a spell check function plot label fields and an overall emphasis on typography.
- TimeSeries: This R package offers novel time series visualisations. It is based on
tmap: With the tmap package, thematic maps can be generated with great flexibility. The syntax for creating plots is similar to that of
ggplot2, but tailored to maps.CGPfunctions: Miscellaneous functions useful for teaching statistics as well as actually practicing the art. They typically are not new methods but rather wrappers around either base R or other packages. This package is used to create slope graph in this project.
Use the code chunk below to call the packages required for this project.
packages = c('readxl', 'datawizard', 'crosstalk', 'tidyr', 'lubridate','tidyverse', 'plotly', 'd3scatter','tidyquant', 'ggbraid', 'ggTimeSeries', 'CGPfunctions', 'tmap', 'ggplot2', 'hrbrthemes', 'ggiraph')
for(p in packages){
if(!require(p,character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}2 Data Preparation
Merchandise Trade provided by Department of Statistics, Singapore (DOS) is used. Pay attention to the fact that the data is in xlsx format with several tabs. As a result, we will manage using functions from the readxl packages.
Step 1. Checking the number of sheets it contains To list all sheets in an excel spreadsheet, we utilize the readxl package’s function excel sheet.
excel_sheets("data/data.xlsx")[1] "Content" "T1" "T2"
Step 2. Importing data
The readxl package’s read_xlsx() function is used in the code chunk below to import the data worksheet of our data workbook into R.
T1 <- read_xlsx("data/data.xlsx", sheet = "T1")
T2 <- read_xlsx("data/data.xlsx", sheet = "T2")Step 3. Transpose the fat table to long table
Gather takes multiple columns and collapses into key-value pairs, duplicating all other columns as needed. You use gather() when you notice that you have columns that are not variables. This function is under tidyr package.
T1 <- gather(T1, "MonthYear", "ImportValue", -`Data Series`)
T2 <- gather(T2, "MonthYear", "ExportValue", -`Data Series`)| Data Series | MonthYear | ImportValue |
|---|---|---|
| Total Merchandise Imports (Thousand Dollars) | 2022 Dec | 49869770 |
| America (Million Dollars) | 2022 Dec | 6901.5 |
| Asia (Million Dollars) | 2022 Dec | 33611.7 |
| Europe (Million Dollars) | 2022 Dec | 7541.8 |
| Oceania (Million Dollars) | 2022 Dec | 1399.9 |
| Data Series | MonthYear | ExportValue |
|---|---|---|
| Total Merchandise Exports (Thousand Dollars) | 2022 Dec | 55000084 |
| America (Million Dollars) | 2022 Dec | 6217.5 |
| Asia (Million Dollars) | 2022 Dec | 39734.8 |
| Europe (Million Dollars) | 2022 Dec | 4924.4 |
| Oceania (Million Dollars) | 2022 Dec | 3034.8 |
Step 4. Convert MonthYear column to date format
For our time series analysis, we then convert the date time to the date format using the ym function from the lubridate package.
T1$`MonthYear` <- ym(T1$`MonthYear`)
T2$`MonthYear` <- ym(T2$`MonthYear`)
# Convert ImportValue column to numeric format
T1$`ImportValue` <- as.numeric(T1$`ImportValue`)
T2$`ExportValue` <- as.numeric(T2$`ExportValue`)Step 5. Separate region and country
Notice that the country column contains both country and region-level information. To prevent misleading analysis, we have created the field Level to distinguish between the Region and Country levels.
# =================== Import =================== #
Region <- T1 %>%
filter(grepl('Million', `Data Series`)) %>%
rename("Region" = "ImportValue")
Country <- T1 %>%
filter(grepl('Thousand', `Data Series`)) %>%
rename("Country" = "ImportValue")
Import <- full_join(Region, Country, by = join_by(`Data Series`, `MonthYear`))
Import <- gather(Import , "Level", "ImportValue", -`Data Series`, -`MonthYear`)
# =================== Export =================== #
Region <- T2 %>%
filter(grepl('Million', `Data Series`)) %>%
rename("Region" = "ExportValue")
Country <- T2 %>%
filter(grepl('Thousand', `Data Series`)) %>%
rename("Country" = "ExportValue")
Export <- full_join(Region, Country, by = join_by(`Data Series`, `MonthYear`))
Export <- gather(Export , "Level", "ExportValue", -`Data Series`, -`MonthYear`)Step 6. Filter year from 2016 and rename column
Because we are focusing on the impact of the Covid epidemic event, we only filter data from 2016 to 2022.
Import <- Import %>%
filter(`MonthYear`> as.Date("2015-12-01")) %>%
rename(`Country` = `Data Series`)
Export <- Export %>%
filter(`MonthYear`> as.Date("2015-12-01")) %>%
rename(`Country` = `Data Series`)Step 7. Merge Import and Export into one table
wide <- full_join(Import, Export, by = join_by(`Country`, `MonthYear`,`Level`))
wide <- wide %>%
mutate("Diff" = ImportValue-ExportValue) %>%
mutate("Total" = ImportValue+ExportValue)
wide$`Country` <- str_replace(wide$`Country`, "Mainland China", "China")
wide$`Country` <- str_replace_all(wide$`Country`, " \\(|Thousand Dollars|\\)", "")
wide$`Country` <- str_replace_all(wide$`Country`, " \\(|Million Dollars|\\)", "")
long <- gather(wide , "Type", "Value", -`Country`, -`MonthYear`,-`Level`)And now we have both a wide table and a long table prepared for analysis.
Table Wide : Merchandise Imports/Export By Region/Market, Monthly
| Country | MonthYear | Level | ImportValue | ExportValue | Diff | Total |
|---|---|---|---|---|---|---|
| America | 2022-12-01 | Region | 6901.5 | 6217.5 | 684.0 | 13119.0 |
| Asia | 2022-12-01 | Region | 33611.7 | 39734.8 | -6123.1 | 73346.5 |
| Europe | 2022-12-01 | Region | 7541.8 | 4924.4 | 2617.4 | 12466.2 |
| Oceania | 2022-12-01 | Region | 1399.9 | 3034.8 | -1634.9 | 4434.7 |
| Africa | 2022-12-01 | Region | 414.9 | 1088.6 | -673.7 | 1503.5 |
Table Long : Merchandise Imports/Export By Region/Market, Monthly
| Country | MonthYear | Level | Type | Value |
|---|---|---|---|---|
| America | 2022-12-01 | Region | ImportValue | 6901.5 |
| Asia | 2022-12-01 | Region | ImportValue | 33611.7 |
| Europe | 2022-12-01 | Region | ImportValue | 7541.8 |
| Oceania | 2022-12-01 | Region | ImportValue | 1399.9 |
| Africa | 2022-12-01 | Region | ImportValue | 414.9 |
3 Visualizations
3.1 Uncover the Singapore Bi-lateral trend over Covid outbreak
Step 1. Prepare data for line plot of Singapore
Code
singapore <- wide %>%
subset((Country == "Total Merchandise Imports"|Country == "Total Merchandise Exports")
& Level == "Country") %>%
group_by(MonthYear)
singaporeribbon <- singapore %>%
select(`Country`, `MonthYear`, `Level`, `ImportValue`, `ExportValue`) %>%
gather("Type", "Value", -`Country`, -`MonthYear`, -`Level`) %>%
drop_na()Step 2. Prepare data for line plot of Singapore
Code
ggplot() +
geom_line(aes(`MonthYear`, `Value`, linetype = `Type`),
data = singaporeribbon,
show.legend = FALSE) +
geom_braid(aes(`MonthYear`,
ymin = `ImportValue`,
ymax = `ExportValue`,
fill = `ImportValue`>`ExportValue`),
show.legend = FALSE,
data = singapore,
alpha = 0.6,
method = 'line') +
labs(title = "Singapore Bi-lateral Trade Trend",
subtitle = "2016-2022",
x = 'Month Year', y = 'Trade Value'
) +
annotate("rect",
xmin = as.Date("2020-01-01"),
xmax = as.Date("2022-12-01"),
ymin = 0,
ymax = 65000000,
alpha = .1,
fill = "yellow") +
annotate("text",
label = "Covid outbreak",
x = as.Date("2021-06-01"),
y=25000000) 
Step 3. Prepare data for line plot of China
Code
braid <- wide %>%
select(`Country`, `MonthYear`, `Level`, `ImportValue`, `ExportValue`) %>%
drop_na() %>%
subset(Country == "China")
ribbon <- gather(braid , "Type", "Value", -`Country`, -`MonthYear`, -`Level`)Step 2. Plot line plot of China
Code
ggplot() +
geom_line(aes(`MonthYear`, `Value`, linetype = `Type`),
data = ribbon,
show.legend = FALSE) +
geom_braid(aes(`MonthYear`,
ymin = `ImportValue`,
ymax = `ExportValue`,
fill = `ImportValue`>`ExportValue`),
data = braid,
alpha = 0.6,
method = 'line')+
annotate("rect",
xmin = as.Date("2020-01-01"),
xmax = as.Date("2022-12-01"),
ymin = 0,
ymax = 10000000,
alpha = .1,
fill = "yellow")
Insights from line plot
- Mainland China is range with a High Import - High Export spectrum. From 2020 to 2021, there was a significant increase in value, particularly export value.
- Taiwan import rate has continuously grown but we can observed significant jump of import value in 2021 after COVID.
3.2 Scatter plot Dashboard
Step 1. Create scatter plot with plotly
Code
# Plot scatter plot
fig <- wide %>%
plot_ly(
x = ~`ImportValue`,
y = ~`ExportValue`,
color = ~`Country`,
frame = ~as.character(`MonthYear`, format = "%Y-%m"),
size = ~`Total`,
sizes = c(10,1000),
text= ~paste("Country:",`Country`,
"\nImport Value:", `ImportValue`, " Thousand Dollars",
"\nExport Value:", `ExportValue`, " Thousand Dollars",
"\nMonth Year:", `MonthYear`),
hoverinfo = "text",
type = 'scatter',
mode = 'markers'
)Step 2. Set up layout
Code
# Create the diagonal line
dline <- function(color = "steelblue") {
list(
type = "line",
yref = "paper",
xref = "paper",
y0 = 0, y1 = 1,
x0 = 0, x1 = 1,
line = list(color = color, dash="dot")
)
}
# Setup layout
fig <- fig %>%
layout(title = list(text="Singapore bi-lateral trade volume"),
subtitle = "2016-2022",
hoverlabel = list(align = "left"),
shapes = dline(),
legend = list(orientation = "h", y = 1, x = 0),
showlegend = FALSE,
xaxis = list(title="Import Value", range = list(0, 10000000)),
yaxis = list(title="Export Value", range = list(0, 10000000)),
width=650,
height=650
)Step 3. Set up animation
Code
fig <- fig %>%
animation_opts(
500, easing = "linear", redraw = FALSE
)
# Animation slider
fig <- fig %>% animation_slider(
currentvalue = list(prefix = "MONTH-YEAR :", font = list(color="red"))
)
figInsights from scatter plot
- Mainland China is range with a High Import - High Export spectrum. From 2020 to 2021, there was a significant increase in value, particularly export value.
- Taiwan import rate has continuously grown but we can observed significant jump of import value in 2021 after COVID.
3.3 Slope Graph
Step 1. Prepare data for slope graph
Code
# Import
slopeimport <- wide %>%
subset(Level == "Region") %>%
mutate(month = month(MonthYear)) %>%
mutate(year = year(MonthYear)) %>%
drop_na() %>%
group_by(Country, year) %>%
summarise(sumyear = sum(ImportValue)) %>%
mutate(Year = factor(year)) %>%
arrange(`sumyear`)
#Export
slopeexport <- wide %>%
subset(Level == "Region") %>%
mutate(month = month(MonthYear)) %>%
mutate(year = year(MonthYear)) %>%
drop_na() %>%
group_by(Country, year) %>%
summarise(sumyear = sum(ExportValue)) %>%
mutate(Year = factor(year)) %>%
arrange(`sumyear`)Step 2. Create slope plot
Code
# Create slope plot for import
p <- newggslopegraph(dataframe = slopeimport,
Times = `Year`,
Measurement = `sumyear`,
Grouping = `Country`,
Title = "Total Import per Year by Region",
SubTitle = "2016-2022",
Caption = NULL)
p + annotate("rect",
xmin = "2020",
xmax = "2022",
ymin = -1,
ymax = 600000,
alpha = .1,
fill = "yellow")
# Create slope plot for export
p2 <- newggslopegraph(dataframe = slopeexport,
Times = `Year`,
Measurement = `sumyear`,
Grouping = `Country`,
Title = "Total Export per Year by Region",
SubTitle = "2016-2022",
Caption = NULL)
p2 + annotate("rect",
xmin = "2020",
xmax = "2022",
ymin = -1,
ymax = 600000,
alpha = .1,
fill = "yellow")

Insights from slope graph
- Mainland China is range with a High Import - High Export spectrum. From 2020 to 2021, there was a significant increase in value, particularly export value.
- Taiwan import rate has continuously grown but we can observed significant jump of import value in 2021 after COVID.
3.4 Heat Map
Step 1. Prepare data by selecting top difference between import and export
Code
heatmap <- wide %>%
drop_na() %>%
group_by(Country) %>%
mutate(totaldiff = sum(Diff)) %>%
arrange(totaldiff) %>%
subset(-55000000>totaldiff|totaldiff>55000000)Step 2. Plot heatmap
Code
p <- heatmap %>%
ggplot(aes(x = MonthYear, y = reorder(Country,totaldiff), fill= Diff)) +
geom_tile_interactive(tooltip = c(paste( "Country:", heatmap$Country,
"\n Import:", heatmap$ImportValue,
"\n Export:", heatmap$ExportValue,
"\n Balance:" , heatmap$Diff,
"\n Month:", heatmap$MonthYear))) +
scale_fill_distiller(palette = "RdPu") +
theme_ipsum() +
geom_tile() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
labs(title = "Trade balance", x="", y="")
girafe(
ggobj = p,
width_svg = 10,
height_svg = 10 * 0.618
)Insights from heatmap
- Mainland China is range with a High Import - High Export spectrum. From 2020 to 2021, there was a significant increase in value, particularly export value.
- Taiwan import rate has continuously grown but we can observed significant jump of import value in 2021 after COVID.
3.5 Cycle Plot
Step 1: Deriving month and year fields
Code
cycle <- wide
cycle$month <- month(cycle$`MonthYear`)
cycle$year <- year(cycle$`MonthYear`)Step 2: Extracting the target country
Code
cycle <- cycle %>%
subset(`Country`== "China") %>%
drop_na()Step 3: Computing year average import by month
Code
x <- cycle %>%
select(Country, month, year, Total) %>%
group_by(month) %>%
summarise(avg = mean(Total))
hline.data <- cycle %>%
group_by(`month`) %>%
mutate(avgvalue = mean(`Total`))Step 4: Plotting the cycle plot
Code
ggplot() +
geom_line(data=cycle,
aes(x=year,
y=Diff,
group=month),
colour="black") +
geom_hline(aes(yintercept=avgvalue),
data=hline.data,
linetype=6,
colour="red",
linewidth=0.5) +
facet_grid(~month) +
labs(axis.text.x = element_blank(),
title = "xxxxxxxxxxxxxxxxxxxxxxxxx") +
xlab("") +
ylab("Import Value") +
theme(plot.title = element_text(size=22),
axis.text.x = element_text(size = 10, angle = 90),
axis.text.y = element_text(size = 10),
strip.text = element_text(size = 10))
Pay Attention
Using callouts is an effective way to highlight content that your reader give special consideration or attention.
3.6 Choropleth Map
Step 1. Prepare data for Choropleth map
Code
data("World")
map <- World %>%
select(iso_a3, name, sovereignt, geometry)
map$name <- as.character(map$name)
map$sovereignt <- as.character(map$sovereignt)Step 2. Create animated map
Code
data_map_area <- map %>%
full_join(wide, by = c('sovereignt' = 'Country')) %>%
drop_na()
tmap_mode("view")
choropleth <- tm_shape(data_map_area) +
tm_polygons("Total") +
tm_facets(along = "MonthYear", free.coords = FALSE)
tmap_animation(choropleth , filename = "choropleth.gif", delay = 25)
4 Interactive Dashboard
Code
# Prepare data for dashboard
line <- long %>%
subset(Type == "ImportValue"|Type == "ExportValue")
# Building interactive filters
d <- highlight_key(line)
filter_tools <- htmltools::div(
filter_select(id = "filter",
label = "Select Country",
sharedData = d,
group = ~Country,
multiple=FALSE),
filter_slider(id = "period",
label = "Select period",
sharedData = d,
column = ~year(MonthYear),
width = "100%"),
filter_slider(id = "value",
label = "Select Value",
sharedData = d,
column = ~Value,
width = "100%"),
filter_checkbox(id = "variable",
label = "Select variable",
sharedData = d,
group = ~Type,
inline = FALSE))
vline <- function(x = 0, color = "steelblue") {
list(
type = "line",
y0 = 0, y1 = 1,
yref = "paper",
x0 = x, x1 = x,
line = list(color = color, dash="dot")
)
}
# plotting interactive scatter plot using plotly
p <- plot_ly(data=d,
type= "scatter",
mode= "line",
x= ~MonthYear,
y= ~Value,
color= ~Type,
colors= "Accent",
# fill = 'tonexty',
text= ~paste("Country:",`Country`,
"\nMonth Year:", `MonthYear`,
"\nType:",`Type`)) %>%
layout(title = list(text="<b>Import/Export trend by country</b>"),
hoverlabel = list(align = "left"),
legend = list(orientation = "h", y = 1, x = 0),
shapes = vline("2020"),
xaxis = list(title="Month Year"),
yaxis = list(title="Value"))
gg <- highlight(p, "plotly_selected")
# Using crosstalk bscols to put all 3 elements (filter, scatter plot, datatable) together.
crosstalk::bscols(filter_tools,gg,DT::datatable(d, class= "display",
filter=list(position="top", clear=FALSE),
options=list(pageLength = 10,scrollY = TRUE,
iDisplayLength = 25)),
widths = c(4, 8, 12),
annotations = list(caption = "Data from Department of Statistics, Singapore (DOS)"))Code
function filter_default() {
document.getElementById("filter").getElementsByClassName("selectized")
[0].selectize.setValue("China", false);
}
window.onload = filter_default;Code
# Building interactive filters
# d <- highlight_key(ribbon)
# # d2 <- highlight_key(braid)
#
# filter_tools <- htmltools::div(
# filter_select(id = "country",
# label = "Select Country",
# sharedData = d,
# group = ~Country,
# multiple=FALSE),
#
# filter_slider(id = "period",
# label = "Select period",
# sharedData = d,
# column = ~year(MonthYear),
# width = "100%"))
#
# # plotting interactive scatter plot using plotly
# p <- ggplot() +
# geom_line(aes(`MonthYear`, `Value`, linetype = `Type`), data = ribbon)
# # +
# # geom_braid(aes(`MonthYear`,
# # ymin = `ImportValue`,
# # ymax = `ExportValue`,
# # fill = `ImportValue`>`ExportValue`),
# # data = braid, alpha = 0.6) +
# # guides(linetype = "none", fill = "none")
#
#
# gg <- highlight(p, "plotly_selected")
#
# # Using crosstalk bscols to put all 3 elements (filter, scatter plot, datatable) together.
# crosstalk::bscols(filter_tools, gg, widths = c(4, 8))